KAFKA-10000: Integration tests (KIP-618)#11782
Conversation
7dfae7c to
196608f
Compare
|
Converting to draft until upstream PRs are reviewed. |
acb8fb3 to
3d65e79
Compare
9e017b8 to
8756ee2
Compare
There was a problem hiding this comment.
Added this line to hide the newly-introduced zombie fencing API from the OpenAPI spec that we started generating with #12067.
c520f26 to
26c6879
Compare
0b0d4c2 to
a751225
Compare
|
Given that all merge conflicts have been resolved and #11781 has already been approved, marking this as ready for review. |
4efd374 to
d322e99
Compare
There was a problem hiding this comment.
Added this check to clean up the flood of ERROR-level log messages that occurs when the task's producer is closed while there are still in-flight messages.
This issue was not specific to these integration tests or to KIP-618, but it clogged up the logs for these tests badly enough that a small tweak in the code base to address it seemed warranted.
There was a problem hiding this comment.
This was way too noisy at INFO-level.
|
@showuon @tombentley @mimaison anybody got time for this one? Just this and #11783 left before everything for KIP-618 is merged! 🎉 |
There was a problem hiding this comment.
I feel the missing Javadoc on this one!
There was a problem hiding this comment.
Haha, fair. Added a few paragraphs; hope it's enough
connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSourceTask.java
Outdated
Show resolved
Hide resolved
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
This doesn't really explain why this second call is necessary.
There was a problem hiding this comment.
Ack, updated to something less snarky.
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
8ed7592 to
2f4b1cc
Compare
There was a problem hiding this comment.
I got a bit confused looking at this. I wonder if we should make these custom config names as constants and maybe also name them similarly. I think custom.exactly.once.support may be clearer than exactly.once.support.level as I initially did not notice it was not the real configuration.
There was a problem hiding this comment.
Ack, did both (constants and renaming property).
There was a problem hiding this comment.
Maybe I'm boring but I'm not quite sure if we want to keep the emojis
There was a problem hiding this comment.
I can live with that 👍
There was a problem hiding this comment.
Should this be fail instead of null?
There was a problem hiding this comment.
Good catch--yes, it should
There was a problem hiding this comment.
I wonder if it would be better to produce a precise number of records and check we get that exact number. Here, if I understand correctly, the connector could have created duplicates and we would not notice it.
There was a problem hiding this comment.
This is true; we don't check for duplicates here. However, the test cases don't exactly do a lot to induce them, either. Even if exactly-once support were disabled for these, the conditions that they're run under are still green-path and shouldn't result in any duplicates.
We perform hard and soft rolling bounces in the connect_distributed_test.py::test_exactly_once_source system test introduced in #11783 and check for duplicates and record content there, which seems appropriate if we're aiming to test how resilient this feature is against real-world and sometimes-suboptimal conditions.
We could possibly add a utility method to assert that a collection of records produced by a source connector with n tasks has consecutive, non-duplicated sequence numbers for each unique task, but I'm not sure it's really worth the effort considering how much coverage we already get from the system test.
There was a problem hiding this comment.
I'm inclined to agree with @mimaison. It shouldn't be that much effort to verify the sequence numbers, and because most people are not in the habit of running the system tests having the assertion would provide much earlier warning of regressions.
There was a problem hiding this comment.
It actually is a surprising amount of work to verify sequence numbers since records can be spread across partitions and, as a result, polled out of order by a consumer. We could reduce the number of topic partitions that we write to in these cases to 1, or place an arbitrary limitation on the number of records produced by each MonitorableSourceTask instance before it stops producing, or shut down the connector and do a poll for the end offsets of the topic and read from beginning up to that point to get absolutely everything that's in the topic. But the first option would not be worth the decrease in coverage IMO and the last two are pretty complex and come with their own edge cases.
I do realize that system tests are pretty uncommon (I hardly run them myself), but do we really think there's risk of a regression that would be caught by these integration tests that wouldn't be caught by all the accompanying unit tests that we have?
If there's a simpler way to do things let me know, too; I could be missing an easy win here.
There was a problem hiding this comment.
Alright, one long weekend later, I've taken a shot at this. I don't know if I love how much complexity it's introduced into the tests here since it's unfortunately non-trivial to do a "read all" for a Kafka topic, but at least if the newly-introduced EmbeddedKafkaCluster::consumeAll method is accurate, we can reuse that logic elsewhere in our tests and not have to worry about solving this problem more than once.
There's still no assertions for in-order delivery, but we now have checks to make sure that no records are dropped or duplicated. LMKWYT
connect/runtime/src/test/java/org/apache/kafka/connect/util/clusters/EmbeddedKafkaCluster.java
Outdated
Show resolved
Hide resolved
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
533eae1 to
bba0d8e
Compare
tombentley
left a comment
There was a problem hiding this comment.
Left a couple more comments, but otherwise this LGTM. Thanks!
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
I'm inclined to agree with @mimaison. It shouldn't be that much effort to verify the sequence numbers, and because most people are not in the habit of running the system tests having the assertion would provide much earlier warning of regressions.
bba0d8e to
a6eb895
Compare
showuon
left a comment
There was a problem hiding this comment.
Made a pass. LGTM. Left some minor comments. Thanks.
...ime/src/test/java/org/apache/kafka/connect/integration/ExactlyOnceSourceIntegrationTest.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Should we mention something about Fibonacci numbers in comment here? I think it's not obvious what we're trying to achieve here.
There was a problem hiding this comment.
Fair enough, done 👍
a6eb895 to
c267490
Compare
c267490 to
54e443e
Compare
|
Failed tests are unrelated: |
Implements embedded end-to-end integration tests for KIP-618, and brings together previously-decoupled logic from upstream PRs.
Relies on changes from: